Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updates to deployment guides #3994

Merged
merged 24 commits into from
Oct 7, 2023
Merged

Updates to deployment guides #3994

merged 24 commits into from
Oct 7, 2023

Conversation

davidmirror-ops
Copy link
Contributor

@davidmirror-ops davidmirror-ops commented Aug 28, 2023

Tracking issue

Closes #3973

Describe your changes

  • Add references to tutorials in multiple places
  • Updates the Multicluster setup guide
  • Removes or rewrites sentences to better match current status

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Screenshots

Note to reviewers

Signed-off-by: davidmirror-ops <[email protected]>
This currently assumes that you have nginx ingress. We'll be updating these
in the near future to use the ALB ingress controller instead.
This section assumes that you're using the NGINX Ingress controller. Instructions and annotations for the ALB controller
are covered in the `Flyte The Hard Way <https://github.com/davidmirror-ops/flyte-the-hard-way/blob/main/docs/06-intro-to-ingress.md#setting-up-amazons-load-balancer-alb-ingress-controller>`__ tutorial.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI these were the final annotations that made things work correctly for me

{
					"alb.ingress.kubernetes.io/certificate-arn": helm.values.userSettings.certificateArn
					"alb.ingress.kubernetes.io/group.name":      "flyte"
					"alb.ingress.kubernetes.io/listen-ports":    '[{"HTTPS":443}]'
					"alb.ingress.kubernetes.io/scheme":          "internet-facing"
					"alb.ingress.kubernetes.io/ssl-redirect":    "443"
					"alb.ingress.kubernetes.io/target-type":     "ip"
					"kubernetes.io/ingress.class":               "alb"
					"alb.ingress.kubernetes.io/inbound-cidrs":   "xxx.xxx.xxx.xxx,xxx.xxx.xxx.xxx"
				}```

may interact with other parts of your stack.

In addition to searching and posting on the `Flyte Slack community <https://flyte-org.slack.com/archives/C01P3B761A6>`__,
Considering the above, we recommend checking out the `"Flyte The Hard Way" <https://github.com/davidmirror-ops/flyte-the-hard-way/tree/main#flyte-the-hard-way>`__ set of community-maintained tutorials that can guide you through the process of preparing the infrastructure and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general I would avoid pointing to an external guide, give also that the guide you are referring to is mostly referring to the single binary deployment, which might be confusing. I think all documentation should be in one place

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree. A second iteration of updates to these guides should be pointed to extend the reach and make it more actionable. Also expanding the tutorial in the https://github.com/unionai-oss/deploy-flyte repo to cover flyte-core

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we forked your guide, @davidmirror-ops , and moved it to flyteorg?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@eapolinario I was thinking of something similar. Moving forward, the idea is that TF is the preferred approach but a manual guide is always a good resource

To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to admin (for example, within Lyft, `Confidant <https://github.com/lyft/confidant>`__ is used).
.. prompt:: bash $

kubectl create token <service-account-name> -n <namespace>
Copy link
Contributor

@gdabisias gdabisias Aug 29, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd expand on what service-account-name should be here.
If I understand correctly this is a separate service account, created in the data plane, which will be assume by the flyteadmin service account in the control plane to generate all the required resources in the data plane, right?


kubectl create token <service-account-name> -n <namespace>

To feed the credentials to FlyteAdmin, you must retrieve them from your new data plane cluster and upload them to ``FlyteAmin``.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would not use the word "upload" which might be misleading. I'd say that you need update your control plane configuration with the new secrets

cluster_2_cacert: {{ cluster 2 cacert here }}
cluster_3_token: {{ cluster 3 token here }}
cluster_3_cacert: {{ cluster 3 cacert here }}
cluster_1_token: "cluster-1-token-here"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we assuming that we have 3 dataplanes or 2? Since in the first case we only need 2 entries. I'd also specify how to change the names of the clusters

Signed-off-by: davidmirror-ops <[email protected]>
@davidmirror-ops davidmirror-ops marked this pull request as draft September 20, 2023 10:07
@@ -108,7 +85,7 @@ There are three different paths for deploying a Flyte cluster:
This option is appropriate if all your compute can `fit on one EKS cluster <https://docs.aws.amazon.com/eks/latest/userguide/service-quotas.html>`__ .
As of this writing, a single Flyte cluster can handle more than 13,000 nodes.

Whatever path you choose, note that ``FlytePropeller`` itself can be sharded as well, though typically it's not required.
Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -tha main data plane component- can be sharded as well, if scale demands require it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: tha -> the

Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
@davidmirror-ops davidmirror-ops marked this pull request as ready for review September 29, 2023 15:58
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
:language: yaml
:lines: 127-135

.. group-tab:: ``flyte-binary`` on EKS using ALB
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't this also the one used for flyte-core using ALB? I'd add that too

@@ -49,29 +49,6 @@ deployment comes with a containerized `Minio <https://min.io/>`__, which offers
- **GCP**: `GCS <https://cloud.google.com/storage/>`__
- **Azure**: `Azure Blob Storage <https://azure.microsoft.com/en-us/products/storage/blobs>`__


Cluster Configuration
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing this section?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm just not sure how it fits here. Additionally is not clear what specific K8s resources it refers to (besides namespaces for projects, which is only an example). So this relationship between Flyte objects and K8s resources is worth documenting, but I'm not sure it fits in this sort of pre-requisites section.

@@ -108,7 +85,7 @@ There are three different paths for deploying a Flyte cluster:
This option is appropriate if all your compute can `fit on one EKS cluster <https://docs.aws.amazon.com/eks/latest/userguide/service-quotas.html>`__ .
As of this writing, a single Flyte cluster can handle more than 13,000 nodes.

Whatever path you choose, note that ``FlytePropeller`` itself can be sharded as well, though typically it's not required.
Regardless of using single or multiple Kubernetes clusters for Flyte, note that ``FlytePropeller`` -the main data plane component- can be sharded as well, if scale demands require it.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear what the word "shared" means here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a reference to the sharding mechanism


.. tabbed:: AWS

1. An IAM Policy that defines the permissions needed for Flyte. A minimum set of permissions include:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You actually need 2 sets of policies, one for the control plane and one for the data plane

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you defining different permissions for control and data plane? The set of perms indicated here work for both

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ideally they should be different, but initially they may be the same

.. code-block:: json


2. At least three IAM Roles configured: one for the controlplane components, another for the dataplane
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controlplane -> control plane

namespace: flyte
type: Opaque
stringData:
dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6STZDdkhGTVVvVDBZcDAxbjdVbDc1Y1VxR28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbHl0ZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXRhcGxhbmUxLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImZseXRlYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkNTdhNjMwZi00ZTZmLTQzNTgtYjQwOS00M2UyMTlhYjg4NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6Zmx5dGU6Zmx5dGVhZG1pbiJ9.Fbn5qJjWP1wyJ08PgZXnrrUdKEhLRYqUzG9Vff1maFO3yBKkv_EBuYc2hjGeW5_ORCrT9qKcFAd3AE_tM3P8AQ-dRoA6K-RcJ2qinxabWmk9RYbtKFr1zujswU6dm-iB7JkjY7yYyBRewbw_m4QRacgG8K11c8bYZ9SZoV86EqGmsNdeCPuv5GiPBiJ0p3hgta4kZ1knCNf8qLBUQVZ-9G5vabYM0lyD6dvGOqlOs1bMzgLeijvpQN471dTLmIZ71anOG2gkuJW_AusnWDF_0rJ3yfISf3dRkhXkLswyq-awgtKbz6ZYjPaJ1eA8dNvSlbDoNrMXOGNlx7p7KhOY-w
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to put an actual secret here

cluster_3_cacert: {{ cluster 3 cacert here }}

Create cluster credentials secret in the control plane cluster.
stringData:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest to use directly encoded data here since the new lines are error prone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right.
So you mean also modifying the previous command to avoid the -D and also using a data section, right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes


Create cluster credentials secret in the control plane cluster.
stringData:
dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6STZDdkhGTVVvVDBZcDAxbjdVbDc1Y1VxR28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbHl0ZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXRhcGxhbmUxLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImZseXRlYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkNTdhNjMwZi00ZTZmLTQzNTgtYjQwOS00M2UyMTlhYjg4NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6Zmx5dGU6Zmx5dGVhZG1pbiJ9.Fbn5qJjWP1wyJ08PgZXnrrUdKEhLRYqUzG9Vff1maFO3yBKkv_EBuYc2hjGeW5_ORCrT9qKcFAd3AE_tM3P8AQ-dRoA6K-RcJ2qinxabWmk9RYbtKFr1zujswU6dm-iB7JkjY7yYyBRewbw_m4QRacgG8K11c8bYZ9SZoV86EqGmsNdeCPuv5GiPBiJ0p3hgta4kZ1knCNf8qLBUQVZ-9G5vabYM0lyD6dvGOqlOs1bMzgLeijvpQN471dTLmIZ71anOG2gkuJW_AusnWDF_0rJ3yfISf3dRkhXkLswyq-awgtKbz6ZYjPaJ1eA8dNvSlbDoNrMXOGNlx7p7KhOY-w
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an actual token

stringData:
dataplane_1_token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlM0WlhfMm1Yb1U4Z1V4R0t6STZDdkhGTVVvVDBZcDAxbjdVbDc1Y1VxR28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJmbHl0ZSIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXRhcGxhbmUxLXRva2VuIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQubmFtZSI6ImZseXRlYWRtaW4iLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkNTdhNjMwZi00ZTZmLTQzNTgtYjQwOS00M2UyMTlhYjg4NTEiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6Zmx5dGU6Zmx5dGVhZG1pbiJ9.Fbn5qJjWP1wyJ08PgZXnrrUdKEhLRYqUzG9Vff1maFO3yBKkv_EBuYc2hjGeW5_ORCrT9qKcFAd3AE_tM3P8AQ-dRoA6K-RcJ2qinxabWmk9RYbtKFr1zujswU6dm-iB7JkjY7yYyBRewbw_m4QRacgG8K11c8bYZ9SZoV86EqGmsNdeCPuv5GiPBiJ0p3hgta4kZ1knCNf8qLBUQVZ-9G5vabYM0lyD6dvGOqlOs1bMzgLeijvpQN471dTLmIZ71anOG2gkuJW_AusnWDF_0rJ3yfISf3dRkhXkLswyq-awgtKbz6ZYjPaJ1eA8dNvSlbDoNrMXOGNlx7p7KhOY-w
dataplane_1_cacert: |
-----BEGIN CERTIFICATE-----
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for an actual cert

team1:
- id: cluster_1
label1:
- id: dataplane_1
weight: 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make sure to explain here what these weight values are

Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
Signed-off-by: davidmirror-ops <[email protected]>
@davidmirror-ops
Copy link
Contributor Author

@gdabisias Thanks for your careful reviews!


.. code-block:: yaml

configmap:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you also need the configuration for datacatalog; in my case I had to also add the endpoint for that.
Note that this should be moved to the default charts and not left to the user


"oidc.eks.<AWS-REGION-CODE>.amazonaws.com/id/<DATAPLANE1-OIDC-PROVIDER>:sub": [
"system:serviceaccount:flyte:flytepropeller",
"system:serviceaccount:*:default"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd explain here that ideally we would add only the namespaces needed, i.e. project-env, so that users can be more specific if they want

Signed-off-by: davidmirror-ops <[email protected]>
Copy link
Contributor

@eapolinario eapolinario left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, left just a couple of comments.

@@ -132,7 +132,7 @@ ingress:
nginx.ingress.kubernetes.io/app-root: /console
grpcAnnotations:
nginx.ingress.kubernetes.io/backend-protocol: GRPC
host: development.uniondemo.run
host: development.uniondemo.run # change for the URL you'll use to connect to Flyte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make this an inexistent url? Maybe rename this to '<your.flyte.url>' ?

@@ -41,7 +41,7 @@ Requirements
- Install `docker <https://docs.docker.com/engine/install/>`__ or any other OCI-compatible tool, like Podman or LXD.
- Install `flytectl <https://github.com/flyteorg/flytectl>`__, the official CLI for Flyte.

While Flyte can run any OCI-compatible task image, using the default Kubernetes container runtime (cri-o), the Flyte
While Flyte can run any OCI-compatible task image using the default Kubernetes container runtime (cri-o), the Flyte
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't the default k8s container runtime containerd?

may interact with other parts of your stack.

In addition to searching and posting on the `Flyte Slack community <https://flyte-org.slack.com/archives/C01P3B761A6>`__,
Considering the above, we recommend checking out the `"Flyte The Hard Way" <https://github.com/davidmirror-ops/flyte-the-hard-way/tree/main#flyte-the-hard-way>`__ set of community-maintained tutorials that can guide you through the process of preparing the infrastructure and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if we forked your guide, @davidmirror-ops , and moved it to flyteorg?

@gdabisias
Copy link
Contributor

Looks all good :)
Please remember to open issues for the points that should not be part of the guide, but should be directly part of the deployment charts, so that we don't forget

@davidmirror-ops davidmirror-ops merged commit 8d66d05 into flyteorg:master Oct 7, 2023
16 checks passed
@davidmirror-ops davidmirror-ops deleted the deployment-guide-updates branch October 7, 2023 12:41
@davidmirror-ops davidmirror-ops restored the deployment-guide-updates branch October 7, 2023 12:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Docs] Command to upgrade multi-cluster deployment is malformed
3 participants